Exploring the relationship between sequence similarity and accurate phylogenetic trees.
Abstract
We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
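The abstract describes a score that reflects the fraction of clades that were correctly clustered, but it does not give the exact definition. As a rough illustration only, the Python sketch below computes one simple version of such a score: the fraction of non-trivial clades in a true rooted tree that also appear in an estimated tree. The nested-tuple tree representation and the function names `clades` and `clade_recovery` are assumptions made for this example, not the authors' implementation.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree as frozensets of leaf names.

    Assumed representation: a leaf is a string, an internal node is a tuple
    of child subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # The leaf set under an internal node defines one clade.
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is present in every tree
    return found


def clade_recovery(true_tree, estimated_tree):
    """Fraction of true clades that also occur in the estimated tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 1.0  # nothing to recover
    shared = true_clades & clades(estimated_tree)
    return len(shared) / len(true_clades)


# Hypothetical five-taxon example: the estimate misplaces one clade.
true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated_tree = ((("A", "C"), "B"), ("D", "E"))
score = clade_recovery(true_tree, estimated_tree)
```

Under this definition, a tree would count toward a threshold such as "clustered over 80% of clades" when `clade_recovery` returns a value above 0.8. Published tools may use unrooted bipartitions rather than rooted clades, so the values here are not directly comparable to the study's.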
Journal: Molecular Biology and Evolution
Volume 23, Issue 11
Published 2006